-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: limit foreground tasks draining loop #19987
Conversation
Foreground tasks that repost themselves can force the draining loop to run indefinitely long without giving other tasks chance to run. This limits the foreground task draining loop to run only the tasks that were in the tasks queue at the beginning of the loop. Fixes: #19937
src/node_platform.h
Outdated
@@ -66,6 +67,7 @@ class PerIsolatePlatformData : | |||
int unref(); | |||
|
|||
// Returns true iff work was dispatched or executed. | |||
// New tasks that are posted during flushing of the queue are not run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit: Maybe be more specific here and say that they are not run as part of this iteration, rather than implying that they aren’t run at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
hmmm... definitely +1 on this but should this be semver-major? |
There are some CI failures. it's not clear if they are unrelated. |
Not sure but there might be issues with Windows. |
Windows failures seem related. |
Thanks. I'll try to reproduce the windows failure and debug it. |
The windows failure should be fixed now. The problem was caused by the incorrect repost count in the new test. That resulted in one left-over task that tried to access Environment after its destruction. |
sequential/test-inspector-stop-profile-after-done is failing on ubuntu1710-x64 on all three CI jobs. Unfortunately it doesn't reproduce on my machine. I will have to find ubuntu 17.10.
|
@ulan that is a known flake. It is independent from this PR and you can safely ignore it :-) |
Guess this is ready then? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/14369/
Ok, I know that |
@apapirovski yes, that seems strange. Could it be caused by a "unlucky" base revision? (I don't know if CI rebases the patch to the tip of tree for each run). |
After further inspection, I think it might be related because the test uses the sampling heap profiler, which internally relies on V8's allocation observer. Incremental marker also uses the allocation observer. There is probably some timing change that causing the test to fail more frequently. I will investigate. |
Sampling heap profiler theory was incorrect. The test is about CPU profiler. I found a bug in the initial patch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated patch LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for following up on this.
src/node_platform.h
Outdated
// Returns true iff work was dispatched or executed. | ||
// New tasks that are posted during flushing of the queue are postponed until | ||
// the next flushing. | ||
// Returns true iff work was dispatched or executed. New tasks that are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/iff/if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apapirovski I guess these were intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but I know the meaning and I didn't even think of it until you mentioned it... but maybe that just reflects badly on me... 🤔 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I'm saying: I don't think this is the place for iff
. It doesn't make it clearer.
But I'm not going to object if someone prefers this. It's a low priority thing for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The iff
was there before my change. I agree that if
seems slightly better.
src/node_platform.h
Outdated
@@ -133,7 +133,10 @@ class NodePlatform : public MultiIsolatePlatform { | |||
double CurrentClockTimeMillis() override; | |||
v8::TracingController* GetTracingController() override; | |||
|
|||
void FlushForegroundTasks(v8::Isolate* isolate); | |||
// Returns true iff work was dispatched or executed. New tasks that are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, s/iff/if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There are two windows failures in the recent CI job:
Both failures seem to me unrelated to the PR. |
CI https://ci.nodejs.org/job/node-test-pull-request/14440/ @ulan there are often some flakes. |
Foreground tasks that repost themselves can force the draining loop to run indefinitely long without giving other tasks chance to run. This limits the foreground task draining loop to run only the tasks that were in the tasks queue at the beginning of the loop. PR-URL: #19987 Fixes: #19937 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Khaidi Chu <i@2333.moe>
@apapirovski thanks for landing! |
@@ -316,7 +316,7 @@ class NodeInspectorClient : public V8InspectorClient { | |||
terminated_ = false; | |||
running_nested_loop_ = true; | |||
while (!terminated_ && channel_->waitForFrontendMessage()) { | |||
platform_->FlushForegroundTasks(env_->isolate()); | |||
while (platform_->FlushForegroundTasks(env_->isolate())) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this line preserve the bug? Because, even though you swap the queue before draining it, you're still running this loop forever if new tasks are constantly added to the original queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inspector posts foreground tasks and requires that they are all processed before going to the outer loop.
AFAIK this code runs only when the inspector is active and the program is paused. The normal libuv tasks are not processed here. Latency shouldn't be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the original bug this is trying to fix (#19937) is that there are cases where foreground tasks can add additional tasks to the queue. The bug was fixed by freezing the queue inside of FlushForegroundTasks, but this line of code I'm commenting on appears to loop THAT CALL so that the freeze fix doesn't actually help anything. It still will run forever, if foreground tasks add themselves.
Unless there's more than one place where FlushForegroundTasks
is being called, and that's not an issue for this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is certainly more than one place. This particular line addresses a very specific interaction.
Foreground tasks that repost themselves can force the draining loop to run indefinitely long without giving other tasks chance to run. This limits the foreground task draining loop to run only the tasks that were in the tasks queue at the beginning of the loop. PR-URL: #19987 Fixes: #19937 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Khaidi Chu <i@2333.moe>
Foreground tasks that repost themselves can force the draining loop to run indefinitely long without giving other tasks chance to run. This limits the foreground task draining loop to run only the tasks that were in the tasks queue at the beginning of the loop. PR-URL: nodejs#19987 Fixes: nodejs#19937 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Khaidi Chu <i@2333.moe>
Foreground tasks that repost themselves can force the draining loop
to run indefinitely long without giving other tasks chance to run.
This limits the foreground task draining loop to run only the tasks
that were in the tasks queue at the beginning of the loop.
Fixes: #19937
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes